The Scottish Qualifications Authority owns the copyright to its exam papers and marking instructions.

Paper 1

Question 1

1a) Hint 1: Note the 'in this context' statement in the question. This means that you have to state your assumptions clearly and make sure that they relate to 'filler sounds'.

1a) Hint 2: Looking back at line 12, note that it mentions 'mean rate of filler sounds spoken in a fixed interval'

1a) Hint 3: Make sure that your assumptions clearly align with the occurrence of filler sounds spoken, and not with the actual filler sounds themselves.

1a) Hint 4: The three assumptions for Poisson models are: events occur independently; no two events happen at the same time; and there is a fixed mean rate of occurrences in a specified interval.

1b) Hint 5: Note the use of 'an improvement' so they are seeking just one improvement, and not more than one.

1b) Hint 6: The three charts are side-by-side, so a reader's eye is drawn across them and ought to notice that the numbers on the vertical axes are different

1b) Hint 7: The simplest thing to improve, therefore, is to rescale the vertical axes so that they all have the same maximum value, making the 'heights' on the charts' y-axes comparable.

1b) Hint 8: An extra improvement would be to ensure that the horizontal axes are also all the same, so that the charts are even more comparable, no matter how they are positioned on a page.

1c) Hint 9: It's important to realise that Table 1 does not summarise 8 pieces of data. It is a frequency table that summarises 28 time periods

1c) Hint 10: Row by row, the table says that there were no zeros, three ones, six twos, six threes, five fours, five fives, two sixes and one seven

1c) Hint 11: Imagine writing out all 28 numbers and then adding them up. Given that there are multiples of each value, there would be a quick way to calculate that total.

1c) Hint 12: The mean is the total value of all 28 numbers, divided by 28.

1c) Hint 13: If you actually read ahead to the next part of the question, just before part (d), you will see that the exact value that you just calculated ought to round to 3.5
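
As a quick check on the arithmetic in Hints 11 and 12, here is a minimal Python sketch. The counts are transcribed from the reading of Table 1 given in Hint 10, not from the paper itself.

```python
# Frequency table from Hint 10: number of filler sounds -> number of time periods
freq = {0: 0, 1: 3, 2: 6, 3: 6, 4: 5, 5: 5, 6: 2, 7: 1}

n_periods = sum(freq.values())                 # 28 time periods, not 8 values
total = sum(x * f for x, f in freq.items())    # each value times its frequency
mean = total / n_periods

print(n_periods, total, round(mean, 4), round(mean, 1))
```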

1d) Hint 14: All of the observed frequencies sum to 28. Therefore, all of the expected frequencies should also sum to 28.

1d) Hint 15: You can check that the sum of the expected frequencies, using 6.94, does not sum to 28.

1d) Hint 16: This shortfall arises because the Poisson distribution has no upper limit on the number of occurrences of filler sounds.

1d) Hint 17: Calculating the probability of P(X ≥ 5), where X ~ Po(3.5) can be used to obtain the expected frequency for the '5+' row of the table
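
A minimal Python sketch of the expected frequencies for 28 periods under X ~ Po(3.5), with the '5+' row obtained from the complement P(X ≥ 5) = 1 - P(X ≤ 4). The helper name `poisson_pmf` is just illustrative.

```python
import math

LAM = 3.5   # Poisson mean from part (c), rounded
N = 28      # number of time periods

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected frequencies for the rows 0, 1, 2, 3, 4
expected = [N * poisson_pmf(k, LAM) for k in range(5)]

# '5+' row via the complement, so the unbounded upper tail is included
p_5_plus = 1 - sum(poisson_pmf(k, LAM) for k in range(5))
expected.append(N * p_5_plus)

print([round(e, 2) for e in expected])
print(round(sum(expected), 6))   # the expected frequencies now sum to 28
```

For reference, 28 × [P(X = 5) + P(X = 6) + P(X = 7)] comes to about 6.94, which is consistent with Hint 16: that figure appears to leave out the tail X ≥ 8.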

1e) Hint 18: Table 2 now has 6 rows, so there are 6 categories.

1e) Hint 19: Think about how many constraints apply to those rows ...

1e) Hint 20: ... one constraint is that the frequencies must sum to 28

1e) Hint 21: ... a second constraint is that the mean of the values must equal 3.5 (or the exact value, obtained in part (c))

1e) Hint 22: Therefore there are two constraints, as we had to estimate the mean rate parameter for the Poisson distribution, using the observed data

1e) Hint 23: degrees of freedom = categories - constraints = 6 - 2

1f)i) Hint 24: notice that the expected frequency of 0.85 is less than one

1f)i) Hint 25: page 6 of the Statistical Formulae and Tables lists the condition for E_i whereby 'none should be less than 1'

1f)i) Hint 26: In addition, the table's values also fail the condition that '80% of the E_i should be at least 5'

1f)ii) Hint 27: It is worth writing out Table 2 again, but with the first two rows combined, so that all of the values that you are going to use are clearly on display

1f)ii) Hint 28: Instead of 6.94, use the correct value obtained from part (d)

1f)ii) Hint 29: Calculate the chi-squared test statistic in the usual manner, taking care not to round any values during the process
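
A sketch of the chi-squared calculation with the first two rows combined, in Python. It assumes the observed '5+' count in Table 2 is 8 (that is, 5 + 2 + 1 from Hint 10); check this against the actual table.

```python
import math

LAM, N = 3.5, 28

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Observed counts for rows 0, 1, 2, 3, 4, 5+ (assumed 5+ count = 5 + 2 + 1)
observed = [0, 3, 6, 6, 5, 8]
expected = [N * poisson_pmf(k, LAM) for k in range(5)]
expected.append(N - sum(expected))   # 5+ row, equal to N * P(X >= 5)

# Combine the first two rows so that no expected frequency is below 1
obs_c = [observed[0] + observed[1]] + observed[2:]
exp_c = [expected[0] + expected[1]] + expected[2:]

# Full precision throughout; round only the final answer
chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 2   # constraints: total of 28 and the estimated mean

print(len(obs_c), df, round(chi_sq, 3))
```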

1f)iii) Hint 30: Remember that null and alternative hypotheses must always clearly state the population parameter value(s) being tested for the distribution being considered

1f)iii) Hint 31: Refer back to your table from part (f)(ii) and see that it has only 5 rows, which are the 5 categories.

1f)iii) Hint 32: Refer back to the logic in part (e) and note that we still have 2 constraints

1f)iii) Hint 33: Therefore, be sure to use the correct number of degrees of freedom for the chi-squared distribution, using df = categories - constraints

1f)iii) Hint 34: After comparing your test statistic to the critical value (or calculating a p-value using a graphic calculator), state whether you would reject H₀, or not.

1f)iii) Hint 35: For the conclusion, include the phrase 'evidence to suggest' and be sure to write it in terms of the alternative hypothesis i.e. we have evidence to support H₁, or we have insufficient evidence to support H₁

1g) Hint 36: A two-sample z-test requires normal distributions and known population variances

1g) Hint 37: A two-sample t-test requires normal distributions and equal population variances

1g) Hint 38: Therefore the common requirement for both of these tests is an assumption of normal distributions

Question 2

2a) Hint 1: The degrees of freedom for the t-test of a population correlation coefficient are equal to 'n - 2', where n is the number of pairs of (sprint, hurdle) times

2a) Hint 2: identify the number of degrees of freedom from Output 1 and then add two

2b) Hint 3: Note the 'with reference to the context' statement in the question.

2b) Hint 4: The output only indicates that p < 0.0001 without specifying a value for it, so you will have to clearly communicate that you recognise that this is less than the 5% significance level

2b) Hint 5: Your conclusion has to say more than whether the true correlation is equal to zero or not: it must refer to the context of sprint times and hurdle times.

2c) Hint 6: Assumption about location requires reference to whether the residuals have a random scatter about the mean of zero, for each of the fitted values on the horizontal axis

2c) Hint 7: One way to judge this is to imagine splitting the residual plot up into a dozen or so 'vertical slices'. Move across the residual plot, looking at each 'slice' in turn, and decide whether each slice contains a random scatter of points on either side of the horizontal axis.

2c) Hint 8: Assumption about spread requires reference to whether the residuals have a constant variance.

2c) Hint 9: One way to judge this is to imagine splitting the residual plot up into a dozen or so 'vertical slices'. Move across the residual plot, looking at each 'slice' in turn, and decide whether each slice contains a similar spread of points compared to each of the other slices.

2c) Hint 10: If the spread of points within each slice remains constant, then the variance of the residuals is not dependent on the corresponding fitted values.

2d) Hint 11: Know that the standard linear regression equation has the form y = a + b × x

2d) Hint 12: Note that in Output 2, we are given sprint = ***** + 0.9665 × hurdles

2d) Hint 13: Therefore 'y' = 'sprint', 'b' = 0.9665, 'x' = 'hurdles' and we just need to determine the value of the constant, 'a'

2d) Hint 14: Also in Output 2, we have the 'hurdle' value given as 13.09, and the 'fitted value' for the sprint given as 24.1366

2d) Hint 15: Substitute 24.1366 and 13.09 into sprint = a + 0.9665 × hurdles, and rearrange to solve for 'a'

2d) Hint 16: For the 99% Prediction Interval, the lower and upper bounds are always equidistant from the centre of the interval, which is the fitted value.

2d) Hint 17: We are given the lower bound value of 22.5224, and the centre value of 24.1366, so the difference between these can determine the 'half-width' of the interval

2d) Hint 18: Combine either the lower bound value or the centre value with the newly found half-width of the interval to calculate the upper bound
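
The arithmetic in Hints 15 to 18 as a short Python sketch, using only the values quoted from Output 2:

```python
b = 0.9665        # slope from Output 2
hurdles = 13.09   # hurdle time from Output 2
fitted = 24.1366  # fitted sprint time from Output 2
lower = 22.5224   # lower bound of the 99% prediction interval

# sprint = a + b * hurdles  =>  a = fitted - b * hurdles
a = fitted - b * hurdles

# The interval is symmetric about the fitted value
half_width = fitted - lower
upper = fitted + half_width

print(round(a, 4), round(half_width, 4), round(upper, 4))
```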

2e) Hint 19: page 5 of the Statistical Formulae and Tables states the condition on ε_i, just after 'If additionally ...' and before the formulae for the two types of interval.

2f) Hint 20: You should know that a Prediction Interval relates (in this context) to an individual time, and not a mean time

2f) Hint 21: Additionally, the '99%' part describes how often you would expect the interval to capture the individual time. Therefore, talk about '99% of the time' or 'out of 100 sprints, about 99 would be expected to be captured by the interval'. Do not use vague phrasing like '99% certain' or '99% chance'

2g) Hint 22: Know that the model in Output 2 was set up to predict a sprint time from a given hurdle time.

2g) Hint 23: Notice that this is different from the question that refers to predicting a hurdle time from a sprint time.

2g) Hint 24: In simpler terms, we have a model that fits 'y' on 'x', and that is not the same as fitting 'x' on 'y'

2g) Hint 25: The way to resolve this is to recalculate the regression line with the variables swapped around, so that hurdle time can be predicted from a given sprint time.
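
To see why swapping the variables matters, the sketch below fits both least-squares lines to a small set of hypothetical times (invented for illustration, not the exam data). The slope of hurdles-on-sprint is not simply the reciprocal of the sprint-on-hurdles slope; their product equals r², which is below 1 unless the correlation is perfect.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical (hurdle, sprint) times, NOT the exam data
hurdles = [12.8, 13.1, 13.4, 13.0, 13.6, 12.9]
sprint = [23.6, 24.3, 24.4, 23.9, 24.9, 24.0]

a1, b1 = least_squares(hurdles, sprint)   # sprint on hurdles (as in Output 2)
a2, b2 = least_squares(sprint, hurdles)   # hurdles on sprint (needed in part (g))

# Rearranging the first line would give slope 1 / b1, which differs from b2
print(round(b2, 4), round(1 / b1, 4))
```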

2h) Hint 26: You should know that a Confidence Interval relates (in this context) to a mean time, and not an individual time

2h) Hint 27: Katerina's actual sprint time of 23.08 seconds was an individual time, and not a mean time.

2h) Hint 28: Therefore the confidence interval for mean times was not expected to capture individual times.

2h) Hint 29: It is reassuring to note that 23.08 is captured by the prediction interval from Output 2

Paper 2

Question 1

1a) Hint 1: Know that E(X) = Σ x P(X = x)

1a) Hint 2: Know that E(X²) = Σ x² P(X = x)

1a) Hint 3: Know that V(X) = E(X²) - E²(X)

1a) Hint 4: Use each of these formulae to obtain the values required.
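
A minimal Python sketch of the three formulae, using exact fractions and a hypothetical distribution (not the one in the paper):

```python
from fractions import Fraction as F

# Hypothetical probability distribution, NOT the one in the exam question
dist = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
assert sum(dist.values()) == 1   # probabilities must sum to 1

e_x = sum(x * p for x, p in dist.items())         # E(X)  = Σ x P(X = x)
e_x2 = sum(x ** 2 * p for x, p in dist.items())   # E(X²) = Σ x² P(X = x)
var_x = e_x2 - e_x ** 2                           # V(X)  = E(X²) - [E(X)]²

print(e_x, e_x2, var_x)
```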

1b) Hint 5: Know that the formula for the variance of a discrete uniform distribution is on page 4 of the Statistical Formulae and Tables

1b) Hint 6: Know that V(X - Y) = V(X) + V(Y), but only when the random variables X and Y are independent

1b) Hint 7: Know that SD(X - Y) = √(V(X - Y))
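
A sketch of Hints 5 to 7 in Python. It assumes a discrete uniform distribution on 1, 2, ..., n, whose variance is (n² - 1)/12; the die and the variance of Y are hypothetical, and the result is only valid if X and Y are independent.

```python
import math

def var_discrete_uniform(n):
    """Variance of a discrete uniform distribution on 1, 2, ..., n."""
    return (n ** 2 - 1) / 12

var_x = var_discrete_uniform(6)   # hypothetical: X is a fair six-sided die score
var_y = 0.5                       # hypothetical variance of Y

# Valid only if X and Y are independent: variances ADD, even for X - Y
var_diff = var_x + var_y
sd_diff = math.sqrt(var_diff)

print(round(var_x, 4), round(var_diff, 4), round(sd_diff, 4))
```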

Question 2

2a) Hint 1: Know that Upper Fence = Q₃ + 1.5 × IQR

2a) Hint 2: Know that Lower Fence = Q₁ - 1.5 × IQR

2a) Hint 3: Use both formulae to calculate the fence values, and do not be distracted by the lower fence being a negative value. It's just a fence, not a data point.

2a) Hint 4: Comment on whether the number 1 is above or below the lower fence, and whether 46 is above or below the upper fence
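
A minimal Python sketch of the fence check. The quartiles are hypothetical (the exam's values are not reproduced here), chosen so that the lower fence is negative as in Hint 3.

```python
# Hypothetical quartiles, NOT the exam's values
q1, q3 = 8, 20
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr   # may be negative; it is a boundary, not a data value
upper_fence = q3 + 1.5 * iqr

for value in (1, 46):
    if value < lower_fence:
        verdict = "below the lower fence: possible outlier"
    elif value > upper_fence:
        verdict = "above the upper fence: possible outlier"
    else:
        verdict = "within the fences"
    print(f"{value}: {verdict} (fences {lower_fence}, {upper_fence})")
```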

2a) Hint 5: Be sure to quote numerical values to clearly convey what is being compared to what

2a) Hint 6: Do not refer to removing any data values due to them being possible outliers. Outliers are values that require further investigation, not values to be automatically removed.

2b) Hint 7: When writing hypotheses, the null hypothesis is generally 'there's nothing to see here'. In this context, that amounts to the two quantities not being associated. Alternatively phrased, the two quantities are independent.

2b) Hint 8: Decide on the level of significance that you will use, as it was not provided in the question, and clearly communicate your choice.

2b) Hint 9: Calculate expected frequencies, the test statistic, the required degrees of freedom and the critical value (or p-value) by your usual method.
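
A sketch of the usual expected-frequency, test-statistic and degrees-of-freedom calculations in Python, using a hypothetical 2 × 3 table of shop type against distance band (not the exam data):

```python
# Hypothetical contingency table, NOT the exam data:
# rows = shop type, columns = distance-travelled bands
observed = [
    [20, 15, 5],    # food
    [10, 15, 15],   # retail
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency = row total × column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([[round(e, 2) for e in row] for row in expected], round(chi_sq, 3), df)
```

The statistic is then compared with the chi-squared critical value for your chosen significance level; for 2 degrees of freedom at the 5% level, tables give 5.991.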

2b) Hint 10: In your conclusion, be sure to write it in terms of the alternative hypothesis, i.e. whether or not there is any evidence to support an association between the two quantities

2b) Hint 11: You should find that there is evidence of an association. Therefore, you could examine the observed and expected frequencies to try to determine what that association might be. You might conjecture that customers appear to travel further for retail shops, and less far for food shops.

Question 3

Hint 1: Know that the appropriate approximation here is a normal approximation

Hint 2: Know that you will be approximating a discrete distribution with a continuous distribution, and so continuity correction will be needed

Hint 3: If X ~ Po(14) and Y ~ N(14, 14), then P(X > 20) ≈ P(Y > 20.5)

Hint 4: Standardise Y to Z to give P(Y > 20.5) = P(Z > (20.5 - 14)/√14)
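
A short Python sketch of the approximation, using `statistics.NormalDist` from the standard library for the normal CDF. The exact Poisson tail is printed alongside purely for comparison; it is not part of the required method.

```python
import math
from statistics import NormalDist

lam = 14
threshold = 20

# P(X > 20) for discrete X means P(X >= 21); the continuity correction uses 20.5
z = (threshold + 0.5 - lam) / math.sqrt(lam)
p_approx = 1 - NormalDist().cdf(z)

# Exact Poisson tail, for comparison only: 1 - P(X <= 20)
p_exact = 1 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(threshold + 1))

print(round(z, 4), round(p_approx, 4), round(p_exact, 4))
```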

Question 4

4a) Hint 1: Know that, for a normal distribution, approximately 95% of values are expected to lie within 2 standard deviations of the mean, and approximately 99.7% within 3 standard deviations of the mean

4a) Hint 2: Know that money is generally a positive quantity, and that in this context, negative values of tips-per-customer would not make sense

4a) Hint 3: The value that is 2 standard deviations below the mean, 4.70 - 2 × 2.80 = -0.90, is negative. This indicates that a normal model does not seem appropriate for values in this region, i.e. more than 2 standard deviations below the mean

4a) Hint 4: Therefore, because zero is a lower bound for tips-per-customer, a normal distribution is unlikely to be appropriate.
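
A quick Python check, using the mean and standard deviation quoted in these hints, of how much probability a N(4.70, 2.80²) model would assign to negative tips-per-customer:

```python
from statistics import NormalDist

mean, sd = 4.70, 2.80   # summary values quoted in the hints

two_sd_below = mean - 2 * sd                # -0.90: already below zero
p_negative = NormalDist(mean, sd).cdf(0)    # model's probability of a negative tip

print(round(two_sd_below, 2), round(p_negative, 3))
```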

4b)i) Hint 5: Note that the question includes 'state the distribution used', so care must be taken to clearly communicate both the random variable and its distribution

4b)i) Hint 6: The Central Limit Theorem is being used, so the random variable is the sample mean, or X̄

4b)i) Hint 7: The distribution of X̄ is approximately normal, and so it is best to write X̄ ≈ N(..., ...), and not just X̄ ~ N(..., ...)

4b)i) Hint 8: Know that, as X̄ is a sample mean, its variance is (population standard deviation)² / n, where n is the sample size

4b)i) Hint 9: Use X̄ ≈ N(4.70, 2.80²/50) to calculate P(X̄ > 5.50)
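
A minimal Python sketch of the Central Limit Theorem calculation in Hint 9, using `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 4.70, 2.80, 50

# By the Central Limit Theorem, X̄ ≈ N(mu, sigma² / n)
se = sigma / math.sqrt(n)                # standard deviation of X̄
p = 1 - NormalDist(mu, se).cdf(5.50)     # P(X̄ > 5.50)

print(round(se, 4), round(p, 4))
```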

4b)ii) Hint 10: Know that the Central Limit Theorem requires the sample size to be greater than 20

4b)ii) Hint 11: Know that the Central Limit Theorem is used when the distribution of the population is not known. In this context, we suspect - from part (a) - that the population of tips-per-customer is not normally distributed, and so it is appropriate to use the Central Limit Theorem.

Question 5

Hint 1: Know the standard procedure for conducting a Wilcoxon one-sample hypothesis test of the population median

Hint 2: Know that a difference value of zero is discarded (by convention), so this will reduce the sample size.

Hint 3: Look out for tied absolute differences; each tied value receives the average of the ranks that the tied values would otherwise occupy.
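
A sketch of the signed-rank bookkeeping in Python: zero differences are discarded and tied absolute differences share the average rank. The sample and hypothesised median are hypothetical, and the function name is illustrative.

```python
def wilcoxon_signed_rank(data, median0):
    """Return (n, W_plus, W_minus) for a one-sample Wilcoxon signed-rank test.

    Zero differences are discarded; tied absolute differences get average ranks.
    """
    diffs = [x - median0 for x in data if x != median0]   # discard zeros
    ordered = sorted(diffs, key=abs)
    n = len(ordered)

    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of equal absolute differences
        while j + 1 < n and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j + 1) / 2   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return n, w_plus, w_minus

# Hypothetical sample and hypothesised median, NOT the exam data
sample = [12, 15, 9, 10, 18, 7, 14, 10, 13]
print(wilcoxon_signed_rank(sample, 10))
```

Here n drops from 9 to 7 because two differences are zero, and the two differences of size 3 share rank 3.5. Which of W+ or W- you compare with the tabulated critical value depends on your alternative hypothesis and the convention of your tables.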

Question 6

Hint 1: Two-sample z-tests involve μ₁, μ₂, σ₁, σ₂, n₁ and n₂. It is therefore important to state clearly which subscripts refer to which period of years: 2000 to 2009, or 2010 to 2018

Hint 2: When stating your hypotheses, again be clear which parameters are for which time period.

Hint 3: Conduct the standard test, and be prepared for a negative value of the test statistic, depending on the order in which you take the two sample means and their corresponding parameters.
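
A minimal Python sketch of the two-sample z statistic, with hypothetical summary statistics (subscript a for 2000 to 2009, b for 2010 to 2018). Swapping the order of the two periods only flips the sign of z.

```python
import math

def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """z statistic for H0: mu_a = mu_b, with the SDs treated as known."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se

# Hypothetical summary statistics, NOT the exam data
z_ab = two_sample_z(4.10, 4.25, 0.40, 0.45, 120, 108)   # (2000-2009) minus (2010-2018)
z_ba = two_sample_z(4.25, 4.10, 0.45, 0.40, 108, 120)   # opposite order

print(round(z_ab, 3), round(z_ba, 3))   # same size, opposite signs
```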

Hint 4: When phrasing the conclusion, be sure to reference back to the 'mean monthly extreme tidal range' context.

Question 7

7a) Hint 1: Recognise that there is a fixed number of trials (here, it is 10)

7a) Hint 2: Recognise that there is a constant probability of success (here, it is 1/6)

7a) Hint 3: Know that this means it is a Binomial distribution, B(10, 1/6)

7b) Hint 4: Recognise that the leftover string could be any length from 0.0cm to 8.0cm

7b) Hint 5: Recognise that every length in this range is equally likely to happen, as the original ball of string was of unknown length

7b) Hint 6: Know that this means it is a continuous uniform distribution, U(0, 8.0)

Question 8

8a) Hint 1: Note that the question implicitly prompts you to use 'n' and 'N' in your response

8a) Hint 2: What would you have to do with the 'N' individuals in the population?

8a) Hint 3: What would you use in order to ensure that it was a random sample that would be selected?

8a) Hint 4: How do you then know who is in the sample of 'n' from 'N'?
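
One way to answer Hints 2 to 4, sketched in Python with hypothetical population and sample sizes: label everyone 1 to N, then let a random-number generator pick n distinct labels.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Label the population 1..N, then pick n distinct labels at random."""
    rng = random.Random(seed)
    labels = range(1, N + 1)                  # step 1: number every individual
    return sorted(rng.sample(labels, n))      # step 2: random selection, no repeats

# Hypothetical sizes, NOT from the exam
chosen = simple_random_sample(N=500, n=20, seed=1)
print(chosen)   # step 3: the individuals with these labels form the sample
```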

8b) Hint 5: Realise that you have been given two proportions, one from each centre.

8b) Hint 6: Know that a two-sample proportion test involves two binomial distributions, each approximated by a normal distribution.

8b) Hint 7: Know that you need to check that np > 5 and nq > 5 for each of the binomial distributions: be sure to calculate all of the values and clearly compare each of them to 5

8b) Hint 8: Top tip: when writing hypotheses, use the subscripts 'A' and 'B' rather than '1' and '2', so that it's clear which values will be substituted into the formulae

8b) Hint 9: Perform a standard test, and conclude in context with reference to the claim being made
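
A sketch of the two-sample proportion test in Python, with hypothetical counts for centres A and B (not the exam data). It checks np > 5 and nq > 5 for each sample, then uses the pooled proportion under H₀; adjust the two-tailed p-value to one tail if the claim is directional.

```python
import math
from statistics import NormalDist

# Hypothetical counts, NOT the exam data
x_a, n_a = 15, 25   # 'successes' and sample size at centre A
x_b, n_b = 30, 40   # 'successes' and sample size at centre B

p_a, p_b = x_a / n_a, x_b / n_b

# Normal-approximation checks: np > 5 and nq > 5 for each sample
for label, n, p in (("A", n_a, p_a), ("B", n_b, p_b)):
    print(label, round(n * p, 2), round(n * (1 - p), 2))
    assert n * p > 5 and n * (1 - p) > 5, f"approximation doubtful for {label}"

# Pooled proportion under H0: p_A = p_B
p_pool = (x_a + x_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
p_value_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_value_two_tailed, 4))
```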

8c) Hint 10: Re-read the whole of the second paragraph, which starts 'A random sample of 29...'. Do you notice any detail of the data gathering that is relevant?

8c) Hint 11: Know that in order to compare two things, they ought to be captured under as similar conditions as possible.

Question 9

9a) Hint 1: Know your laws of expectation and variance: E(aX + b) = aE(X) + b, and V(aX + b) = a²V(X)

9b) Hint 2: Know that 'within 1 cm' means anywhere from 1 cm below to 1 cm above the required length

9b) Hint 3: Realise that we are seeking P(-1 < Y < 1), as Y is the error, which can be either positive or negative

9b) Hint 4: Proceed with calculating the probability, using the values of the mean and variance from part (a)

9b) Hint 5: Know that we have 80 'trials', each with the same probability of 'success', where success means being within 1 cm of the required length

9b) Hint 6: This is the same as working out the expectation, np, of a binomial distribution with n = 80 and p = the value just calculated
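
A sketch of Hints 3 to 6 in Python. It assumes Y is normally distributed, and the mean and variance of Y below are hypothetical stand-ins for your part (a) answers.

```python
from statistics import NormalDist

# Hypothetical results from part (a), NOT the exam's values; Y assumed normal
mean_y, var_y = 0.2, 0.64

Y = NormalDist(mean_y, var_y ** 0.5)
p_within = Y.cdf(1) - Y.cdf(-1)      # P(-1 < Y < 1): within 1 cm either side

n_pieces = 80
expected_count = n_pieces * p_within   # binomial expectation np

print(round(p_within, 4), round(expected_count, 1))
```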

Question 10

10a) Hint 1: Note that the random variable is specifically about the mass of each spoonful, not the spoon itself

10a) Hint 2: Hence, make sure that the assumption makes reference to masses of spoonfuls, and not something that is less precise

10a) Hint 3: Know that the summary statistics can be used to calculate the sample standard deviation, using a selection of the formulae on page 4 of the Statistical Formulae and Tables

10a) Hint 4: Recognise that we have a small sample (n = 6) and only a sample standard deviation, rather than a population standard deviation.

10a) Hint 5: Know that Student's t distribution is the required basis for the test that will be performed.
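
A minimal Python sketch of the one-sample t statistic from summary statistics, using Σ(x - x̄)² = Σx² - (Σx)²/n. All the numbers, including the hypothesised mean mu0, are hypothetical.

```python
import math

# Hypothetical summary statistics for n = 6 spoonfuls, NOT the exam data
n = 6
sum_x = 61.2      # Σx  (grams)
sum_x2 = 624.50   # Σx² (grams²)
mu0 = 10.0        # hypothetical mean mass under H0

mean = sum_x / n
s_xx = sum_x2 - sum_x ** 2 / n          # Σ(x - x̄)² = Σx² - (Σx)²/n
s = math.sqrt(s_xx / (n - 1))           # sample standard deviation
t = (mean - mu0) / (s / math.sqrt(n))   # compare with t on n - 1 = 5 df

print(round(mean, 3), round(s, 4), round(t, 3))
```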

10a) Hint 6: When concluding, make sure to reference the mean mass of spoonfuls

10b) Hint 7: Know that knowledge of the population variance, rather than estimating the variance from the sample, steers us away from the t-test and towards a z-test

Question 11

11a)i) Hint 1: Recognise that we have a mean, and a 2 sigma limit below the mean, and thus the limit line is 2 standard errors below the mean

11a)i) Hint 2: Therefore halving the difference between the mean and the 2-sigma limit gives you one standard error

11a)i) Hint 3: Use this one standard error to calculate the 1-sigma limit and the 3-sigma limit, using the value of 0.92 as the 'reference point'
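
The calculation in Hints 1 to 3 as a short Python sketch. The centre line 0.92 comes from the question; the 2-sigma lower limit of 0.87 is hypothetical.

```python
# Centre line from the question; the 2-sigma lower limit is HYPOTHETICAL
centre = 0.92
lower_2sigma = 0.87

se = (centre - lower_2sigma) / 2   # one standard error
lower_1sigma = centre - se
lower_3sigma = centre - 3 * se

print(round(se, 4), round(lower_1sigma, 4), round(lower_3sigma, 4))
```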

11a)ii) Hint 4: Notice that the context of the question has been re-introduced. What does it mean, in this context, to have a high sample proportion of successes?

11a)ii) Hint 5: Consider whether having a high sample proportion is a good thing, or not a good thing.

11b) Hint 6: It is recommended to sketch out the p-chart with all of the 1, 2, 3 sigma lower limit lines, and then plot the values of 0.800, 0.833 and 0.815 on the p-chart

11b) Hint 7: Refer to page 4 of the Statistical Formulae and Tables and read the four Western Electric Company Rules

11b) Hint 8: Determine whether any of the rules might be broken, and what that might mean

Question 12

12a) Hint 1: Note that it is 'bounce heights' that are being measured, and not just 'bounces'

12a) Hint 2: When stating your two assumptions, be sure to include clear contextual descriptions that mention 'bounce heights' where appropriate

12a) Hint 3: Realise that we have a small sample (n = 15) and that the population variance is not known.

12a) Hint 4: Know that a confidence interval based around a t-distribution will be most appropriate

12a) Hint 5: Be sure to use the correct number of degrees of freedom, which will be 'n - 1'
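
A sketch of the t-based interval in Python with hypothetical summary statistics. It assumes a 95% confidence level, for which tables give t = 2.145 on 14 degrees of freedom; use the level and critical value the question actually asks for.

```python
import math

# Hypothetical summary statistics for n = 15 bounce heights (cm), NOT the exam data
n = 15
mean = 146.0
s = 5.0
t_crit = 2.145   # t on n - 1 = 14 df for a 95% interval (level assumed)

half_width = t_crit * s / math.sqrt(n)
interval = (mean - half_width, mean + half_width)

print(tuple(round(v, 2) for v in interval))
```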

12b) Hint 6: Note the statement 'at least 141cm', so balls that are expected to bounce lower than that will be of concern.

12b) Hint 7: Decide where the confidence interval is located, relative to the number 141, and clearly communicate any numerical comparisons that you make.

12b) Hint 8: Decide what that location means for the batch of balls, from which the sample was drawn.

Question 13

13a) Hint 1: Take the suggestion and draw a neat diagram of your choice to help. Your whole solution will depend upon that diagram.

13a) Hint 2: If one diagram doesn't work for you, then try the other diagram - both will be valid approaches to take.

13a) Hint 3: Take great care when putting the given probabilities in the correct place(s) on your diagram. Be careful not to make any incorrect assumptions about events A and B.

13a) Hint 4: Know that the probability required is for 'not A and B', i.e. the event that B happens but A does not

13b) Hint 5: Don't be put off by the apparent complexity of the symbols!

13b) Hint 6: The event of interest is 'A and B' or 'not A and not B'. This is the same as 'both A and B happen' or 'neither A nor B happens'.

13b) Hint 7: Look on your diagram for these two parts, and then combine them appropriately.

13c) Hint 8: This is a standard conditional probability calculation that is likely to use some of your earlier calculated results
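
A sketch of parts (a) to (c) using exact fractions and hypothetical probabilities (not the exam's values). P(A | B) is shown as one example of a conditional probability.

```python
from fractions import Fraction as F

# Hypothetical probabilities, NOT the exam's values
p_a, p_b, p_a_and_b = F(1, 2), F(2, 5), F(3, 20)

p_nota_and_b = p_b - p_a_and_b            # P(A' ∩ B): B happens, A does not
p_a_or_b = p_a + p_b - p_a_and_b          # P(A ∪ B)
p_neither = 1 - p_a_or_b                  # P(A' ∩ B')
p_both_or_neither = p_a_and_b + p_neither # disjoint events, so add
p_a_given_b = p_a_and_b / p_b             # P(A | B) = P(A ∩ B) / P(B)

print(p_nota_and_b, p_both_or_neither, p_a_given_b)
```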
